The present study will examine young adults from the National Epidemiologic Survey of Alcohol and Related Conditions (NESARC). The goals of the analysis will include 1) establishing the relationship between major depression and nicotine dependence; and 2) determining whether or not the relationship between major depression and nicotine dependence exists above and beyond smoking quantity.
Variables from NESARC that will be used include: MAJORDEPLIFE (has subject experienced a major depression?), CHECK321 (has subject smoked cigarettes in past 12 months?), AGE (age of subject), TAB12MDX (tobacco dependence past 12 months), S3AQ3B1 (usual frequency when cigarettes smoked), ETHRACE2A (ethnicity of subject), SEX (gender of subject), and S3AQ3C1 (usual smoking quantity of cigarettes when smoked). 
First, the data is placed on the search path using the PDS package. Next, a subset of people 25 or younger who smoked in the last 12 months is created using the filter function from dplyr. Note that CHECK321 == 1 is used to see if subject has smoked in the past 12 months and !is.na() is used to make sure the subset does not contain NAs for the chosen variables. Last, the variables of interest are selected and stored in the data frame nesarc.sub using the select function from the dplyr package.
library(PDS)
library(dplyr)
nesarc.sub <- NESARC %>%
filter(!is.na(CHECK321) & !is.na(AGE) & CHECK321 ==1 & AGE <= 25)
dim(nesarc.sub)
[1] 1706 3008
nesarc.sub <- nesarc.sub %>%
select(MAJORDEPLIFE, CHECK321, AGE, TAB12MDX, S3AQ3B1, S3AQ3C1, ETHRACE2A, SEX)
dim(nesarc.sub)
[1] 1706 8
summary(nesarc.sub)
MAJORDEPLIFE CHECK321 AGE TAB12MDX S3AQ3B1 S3AQ3C1
0:1260 1:1706 Min. :18.00 0:810 1:1320 Min. : 1.0
1: 446 2: 0 1st Qu.:20.00 1:896 2: 68 1st Qu.: 5.0
9: 0 Median :22.00 3: 91 Median :10.0
Mean :21.61 4: 88 Mean :11.8
3rd Qu.:23.00 5: 65 3rd Qu.:20.0
Max. :25.00 6: 71 Max. :99.0
9: 3
ETHRACE2A SEX
1:1060 1:856
2: 211 2:850
3: 42
4: 58
5: 335
Running summary on the nesarc.sub reveals some non-obvious categories for the factors CHECK321 and S3AQ3B1. Reviewing the Code Book (HINT: Inside a pdf use shift F (Windows) or command F (Mac) then type the variable name inside the box to find the variable in the pdf), it is noted that a 9 is used to indicate missing values for S3AQ3B1 and a 99 is used to indicate missing values for S3AQ3C1.
The variable S3AQ3B1 uses a 9 to record unknown for smoking frequency and a 99 is used to record unknown for S3AQ3C1.
nesarc.sub$S3AQ3B1[nesarc.sub$S3AQ3B1 == 9] <- NA
summary(nesarc.sub$S3AQ3B1) # Note that 9 still appears
1 2 3 4 5 6 9 NA's
1320 68 91 88 65 71 0 3
nesarc.sub$S3AQ3B1 <- factor(nesarc.sub$S3AQ3B1)[, drop = TRUE]
summary(nesarc.sub$S3AQ3B1) # Unused level no longer appears
1 2 3 4 5 6 NA's
1320 68 91 88 65 71 3
nesarc.sub$S3AQ3C1[nesarc.sub$S3AQ3C1 == 99] <- NA
summary(nesarc.sub$S3AQ3C1)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 5.00 10.00 11.34 20.00 98.00 9
nesarc.sub$CHECK321[nesarc.sub$CHECK321 == 9] <- NA
nesarc.sub$CHECK321 <- factor(nesarc.sub$CHECK321)[ , drop = TRUE]
summary(nesarc.sub)
MAJORDEPLIFE CHECK321 AGE TAB12MDX S3AQ3B1
0:1260 1:1706 Min. :18.00 0:810 1 :1320
1: 446 1st Qu.:20.00 1:896 2 : 68
Median :22.00 3 : 91
Mean :21.61 4 : 88
3rd Qu.:23.00 5 : 65
Max. :25.00 6 : 71
NA's: 3
S3AQ3C1 ETHRACE2A SEX
Min. : 1.00 1:1060 1:856
1st Qu.: 5.00 2: 211 2:850
Median :10.00 3: 42
Mean :11.34 4: 58
3rd Qu.:20.00 5: 335
Max. :98.00
NA's :9
To estimate the total number of cigarettes a subject smokes per month, convert S3AQ3B1 (a factor with 6 levels) to a numeric variable using as.numeric. DaysSmoke estimates the days per month a subject smokes. The variable TotalCigsSmoked estimates the monthly number of cigarettes a subject smokes per month by multiplying DaysSmoke times S3AQ3C1 (the usual quantity smoked per day). The numeric variable TotalCigsSmoked is converted into a factor with roughly equivalent numbers stored in each level of the factor CigsSmokedFac using the cut function.
nesarc.sub$DaysSmoke <- as.numeric(nesarc.sub$S3AQ3B1)
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 1] <- 30
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 2] <- 4*5.5
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 3] <- 4*3.5
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 4] <- 4*1.5
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 5] <- 2.5
nesarc.sub$DaysSmoke[nesarc.sub$DaysSmoke == 6] <- 1
# Using dplyr again
nesarc.sub <- nesarc.sub %>%
mutate(TotalCigsSmoked = DaysSmoke*S3AQ3C1)
proportions <- quantile(nesarc.sub$TotalCigsSmoked, na.rm = TRUE)
proportions
0% 25% 50% 75% 100%
1 90 300 600 2940
nesarc.sub$CigsSmokedFac <- cut(nesarc.sub$TotalCigsSmoked, breaks = proportions, include.lowest = TRUE)
head(nesarc.sub)
MAJORDEPLIFE CHECK321 AGE TAB12MDX S3AQ3B1 S3AQ3C1 ETHRACE2A SEX
1 0 1 25 1 1 3 2 2
2 1 1 21 1 2 3 5 1
3 1 1 24 1 1 10 1 2
4 1 1 23 1 1 10 1 2
5 0 1 21 0 1 20 1 1
6 1 1 23 0 1 5 1 2
DaysSmoke TotalCigsSmoked CigsSmokedFac
1 30 90 [1,90]
2 22 66 [1,90]
3 30 300 (90,300]
4 30 300 (90,300]
5 30 600 (300,600]
6 30 150 (90,300]
In the next section of code, variables are labeled and levels of factors are given informative labels. The order of the levels is also rearranged for the variables S3AQ3B1, TAB12MDX, and SEX.
library(Hmisc)
label(nesarc.sub$TAB12MDX) <- "Tobacco Dependence past 12 Months"
label(nesarc.sub$CHECK321) <- "Smoked Cigarettes in the Past 12 Months"
label(nesarc.sub$S3AQ3B1) <- "Usual Smoking Frequency"
label(nesarc.sub$S3AQ3C1) <- "Usual Smoking Quantity"
nesarc.sub$MAJORDEPLIFE <- factor(nesarc.sub$MAJORDEPLIFE,
labels = c("No Depression", "Yes Depression"))
nesarc.sub$S3AQ3B1 <- factor(nesarc.sub$S3AQ3B1,
labels = c("Every Day", "5 to 6 Days/week",
"3 to 4 Days/week", "1 to 2 Days/week",
"2 to 3 Days/month",
"Once a month or less"))
nesarc.sub$S3AQ3B1 <- factor(nesarc.sub$S3AQ3B1,
levels = c("Once a month or less", "2 to 3 Days/month",
"1 to 2 Days/week", "3 to 4 Days/week",
"5 to 6 Days/week", "Every Day"))
nesarc.sub$TAB12MDX <- factor(nesarc.sub$TAB12MDX,
labels = c("No Nicotine Dependence", "Nicotine Dependence"))
nesarc.sub$TAB12MDX <- factor(nesarc.sub$TAB12MDX,
levels = c("Nicotine Dependence", "No Nicotine Dependence"))
nesarc.sub$ETHRACE2A <- factor(nesarc.sub$ETHRACE2A,
labels = c("Caucasian", "African American",
"Native American", "Asian", "Hispanic"))
nesarc.sub$SEX <- factor(nesarc.sub$SEX, labels = c("Male", "Female"))
table(nesarc.sub$SEX)
Male Female
856 850
nesarc.sub$SEX <- factor(nesarc.sub$SEX, levels = c("Female", "Male"))
summary(nesarc.sub)
MAJORDEPLIFE CHECK321 AGE
No Depression :1260 1:1706 Min. :18.00
Yes Depression: 446 1st Qu.:20.00
Median :22.00
Mean :21.61
3rd Qu.:23.00
Max. :25.00
TAB12MDX S3AQ3B1 S3AQ3C1
Nicotine Dependence :896 Once a month or less: 71 Min. : 1.00
No Nicotine Dependence:810 2 to 3 Days/month : 65 1st Qu.: 5.00
1 to 2 Days/week : 88 Median :10.00
3 to 4 Days/week : 91 Mean :11.34
5 to 6 Days/week : 68 3rd Qu.:20.00
Every Day :1320 Max. :98.00
NA's : 3 NA's :9
ETHRACE2A SEX DaysSmoke TotalCigsSmoked
Caucasian :1060 Female:850 Min. : 1.00 Min. : 1.0
African American: 211 Male :856 1st Qu.:30.00 1st Qu.: 90.0
Native American : 42 Median :30.00 Median : 300.0
Asian : 58 Mean :25.07 Mean : 319.4
Hispanic : 335 3rd Qu.:30.00 3rd Qu.: 600.0
Max. :30.00 Max. :2940.0
NA's :3 NA's :9
CigsSmokedFac
[1,90] :429
(90,300] :678
(300,600] :503
(600,2.94e+03]: 87
NA's : 9
nesarc.sub <- nesarc.sub %>%
rename(Depression = MAJORDEPLIFE, TobaccoDependence = TAB12MDX,
SmokingFreq = S3AQ3B1, DailyCigsSmoked = S3AQ3C1, Ethnicity = ETHRACE2A,
Sex = SEX, Age = AGE)
head(nesarc.sub)
Depression CHECK321 Age TobaccoDependence SmokingFreq
1 No Depression 1 25 Nicotine Dependence Every Day
2 Yes Depression 1 21 Nicotine Dependence 5 to 6 Days/week
3 Yes Depression 1 24 Nicotine Dependence Every Day
4 Yes Depression 1 23 Nicotine Dependence Every Day
5 No Depression 1 21 No Nicotine Dependence Every Day
6 Yes Depression 1 23 No Nicotine Dependence Every Day
DailyCigsSmoked Ethnicity Sex DaysSmoke TotalCigsSmoked
1 3 African American Female 30 90
2 3 Hispanic Male 22 66
3 10 Caucasian Female 30 300
4 10 Caucasian Female 30 300
5 20 Caucasian Male 30 600
6 5 Caucasian Female 30 150
CigsSmokedFac
1 [1,90]
2 [1,90]
3 (90,300]
4 (90,300]
5 (300,600]
6 (90,300]
Basic tables can be created using the functions table or xtabs. Frequency tables are created for the variables TobaccoDependence, CigsSmokedFac, and SmokingFreq.
T1 <- xtabs(~TobaccoDependence, data = nesarc.sub)
T2 <- xtabs(~CigsSmokedFac, data = nesarc.sub)
T3 <- xtabs(~SmokingFreq, data = nesarc.sub)
table(nesarc.sub$TobaccoDependence)
Nicotine Dependence No Nicotine Dependence
896 810
T1
TobaccoDependence
Nicotine Dependence No Nicotine Dependence
896 810
T2
CigsSmokedFac
[1,90] (90,300] (300,600] (600,2.94e+03]
429 678 503 87
T3
SmokingFreq
Once a month or less 2 to 3 Days/month 1 to 2 Days/week
71 65 88
3 to 4 Days/week 5 to 6 Days/week Every Day
91 68 1320
In the data frame nesarc.sub, there are 896 nicotine dependent subjects and 810 subjects that are not nicotine dependent. A small number of smokers (87) smoke over 600 cigarettes per month. Most of the subjects in nesarc.sub are daily smokers (1320) with the remainder distributed uniformly across the first five levels of SmokingFreq.
The barplots are all created with the package ggplot2. The barplots start with the defaults for the geom_bar and add more detail to the plot with each graph.
library(ggplot2)
g1 <- ggplot(data = nesarc.sub, aes(x = TobaccoDependence)) +
geom_bar()
g1
<<<<<<< HEAD
g2 <- ggplot(data = na.omit(nesarc.sub), aes(x = CigsSmokedFac)) +
geom_bar(fill = "brown")
g2
g2 <- ggplot(data = na.omit(nesarc.sub), aes(x = CigsSmokedFac)) +
geom_bar(fill = "brown")
g2
g3 <- ggplot(data = na.omit(nesarc.sub), aes(SmokingFreq)) +
geom_bar(fill = "gray") +
labs(x = "", y = "Total Number", title = "Smoking Frequency for Young Adults") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5))
g3
<<<<<<< HEAD
One popular way to graph a continuous variable is to use a histogram. R has the base function hist which can be used for this purpose. We will also use the package ggplot2 to create histograms. We start with a basic histogram of the variable TotalCigsSmoked.
hist(nesarc.sub$TotalCigsSmoked)
<<<<<<< HEAD
Experiment with the code in the previous code chunk to change the color, the title, and if needed the labels on the \(x\)- and \(y\)-axes.
ggplot(data = nesarc.sub, aes(x = TotalCigsSmoked)) +
geom_histogram(binwidth = 200, fill = "gray") +
labs(x = "Estimated Cigarettes Smoked per Month") +
theme_bw()
<<<<<<< HEAD
ggplot(data = nesarc.sub, aes(x = TotalCigsSmoked)) +
geom_density(fill = "sienna") +
labs(x = "Estimated Cigarettes Smoked per Month") +
theme_bw()
<<<<<<< HEAD
ggplot(data = nesarc.sub, aes(x = DailyCigsSmoked)) +
geom_density(fill = "slategrey") +
labs(x = "Reported Cigarettes Smoked per Day") +
theme_bw()
<<<<<<< HEAD
summary(nesarc.sub$TotalCigsSmoked)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.0 90.0 300.0 319.4 600.0 2940.0 9
fivenum(nesarc.sub$TotalCigsSmoked)
[1] 1 90 300 600 2940
median(nesarc.sub$TotalCigsSmoked, na.rm = TRUE)
[1] 300
IQR(nesarc.sub$TotalCigsSmoked, na.rm = TRUE)
[1] 510
The shape of the distribution for the estimated cigarettes smoked per month is skewed to the right. The center (median) of the distribution is 300 and the spread (interquartile range) for the distribution is 510.
Consider using depression status (Depression) to predict nicotine dependence (TobaccoDependence).
T1 <- xtabs(~ TobaccoDependence + Depression, data = nesarc.sub)
T1
Depression
TobaccoDependence No Depression Yes Depression
Nicotine Dependence 574 322
No Nicotine Dependence 686 124
T2 <- prop.table(T1, 2)
T2
Depression
TobaccoDependence No Depression Yes Depression
Nicotine Dependence 0.4555556 0.7219731
No Nicotine Dependence 0.5444444 0.2780269
barplot(T2)
<<<<<<< HEAD
The last graph needs a legend as well as a title. While it is possible to construct a legend and title for the last graph, it is much better to use an approach that will generate the legend automatically.
g1 <- ggplot(data = nesarc.sub, aes(x = Depression, fill = TobaccoDependence)) +
geom_bar(position = "fill") +
theme_bw()
g1
<<<<<<< HEAD
Tweaking the last graph.
g1 + labs(x = "", y = "",
title = "Fraction of young adult smokers with and without\n nicotine dependence by depression status") +
scale_fill_discrete(name="Tobacco Addiction\nStatus")
<<<<<<< HEAD
g2 <- ggplot(data = na.omit(nesarc.sub), aes(x = SmokingFreq, fill = TobaccoDependence)) +
geom_bar(position = "fill") +
theme_bw()
g2
<<<<<<< HEAD
g3 <- g2 + facet_grid(Sex ~ .) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
labs(x = "", y = "", title = "Fraction of young adult smokers with and without\n nicotine dependence by smoking frequency") +
scale_fill_discrete(name="Tobacco Addiction\nStatus")
g3
<<<<<<< HEAD
ggplot(data = na.omit(nesarc.sub), aes(x = CigsSmokedFac, fill = TobaccoDependence)) +
geom_bar(position = "fill") +
theme_bw() + facet_grid(Sex ~ Depression) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
labs(x = "Estimated Number of Cigarettes Smoked per Month", y = "", title = "Fraction of young adult smokers with and without\n nicotine dependence by smoking quantity") +
scale_fill_discrete(name="Tobacco Addiction\nStatus")
<<<<<<< HEAD
library(vcd)
mosaic(~TobaccoDependence + Depression, data = nesarc.sub)
<<<<<<< HEAD
mosaic(~TobaccoDependence + Depression, data = nesarc.sub, shade = TRUE)
mosaic(~TobaccoDependence + Depression + Sex, data = nesarc.sub, shade = TRUE)
mosaic(~TobaccoDependence + Depression + Sex + CigsSmokedFac, data = nesarc.sub, shade = TRUE)
mosaic(~TobaccoDependence + Depression, data = nesarc.sub, shade = TRUE)
mosaic(~TobaccoDependence + Depression + Sex, data = nesarc.sub, shade = TRUE)
mosaic(~TobaccoDependence + Depression + Sex + CigsSmokedFac, data = nesarc.sub, shade = TRUE)
Note nesarc.sub is not the same data set discussed in the book. In the book, the subset is smaller since it includes only people who have smoked every day (S3AQ41). This subset has \((n = 1706)\) compared to the one in the book which has \(n = 1320\).
ggplot(data = nesarc.sub, aes(x = DailyCigsSmoked)) +
geom_density(fill = "gray") +
facet_grid(Depression~.) +
theme_bw()
<<<<<<< HEAD
tapply(nesarc.sub$DailyCigsSmoked, list(nesarc.sub$Depression), sd, na.rm = TRUE)
No Depression Yes Depression
8.545279 9.172743
ggplot(data = nesarc.sub, aes(x = Depression, y = DailyCigsSmoked, fill = Depression)) + geom_violin() +
theme_bw()
<<<<<<< HEAD
t.test(DailyCigsSmoked ~ Depression, data = nesarc.sub)
Welch Two Sample t-test
data: DailyCigsSmoked by Depression
t = -1.7158, df = 732.843, p-value = 0.08663
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.8312941 0.1231774
sample estimates:
mean in group No Depression mean in group Yes Depression
11.11891 11.97297
t.test(TotalCigsSmoked ~ Depression, data = nesarc.sub)
Welch Two Sample t-test
data: TotalCigsSmoked by Depression
t = -1.8353, df = 734.092, p-value = 0.06686
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-59.652230 2.007771
sample estimates:
mean in group No Depression mean in group Yes Depression
311.8771 340.6993
t.test(TotalCigsSmoked ~ Depression, data = nesarc.sub, var.equal = TRUE) # assumptions not really met here
Two Sample t-test
data: TotalCigsSmoked by Depression
t = -1.8965, df = 1695, p-value = 0.05807
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-58.6306829 0.9862242
sample estimates:
mean in group No Depression mean in group Yes Depression
311.8771 340.6993
summary(aov(TotalCigsSmoked ~ Depression, data = nesarc.sub))
Df Sum Sq Mean Sq F value Pr(>F)
Depression 1 272337 272337 3.597 0.0581 .
Residuals 1695 128346537 75721
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
9 observations deleted due to missingness
# OR
mod <- lm(TotalCigsSmoked ~ Depression, data = nesarc.sub)
anova(mod)
Analysis of Variance Table
Response: TotalCigsSmoked
Df Sum Sq Mean Sq F value Pr(>F)
Depression 1 272337 272337 3.5966 0.05807 .
Residuals 1695 128346537 75721
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You should add your own notes here when I explain this in class.
set.seed(123)
SIMS <- 10000 - 1
stat <- numeric(SIMS)
TS <- t.test(DailyCigsSmoked ~ Depression, data = nesarc.sub)$stat
for(i in 1:SIMS){
stat[i] <- t.test(DailyCigsSmoked ~ sample(Depression), data = nesarc.sub)$stat
}
ggplot(data = data.frame(stat = stat), aes(x = stat)) +
geom_density(fill = "pink") +
theme_bw() +
stat_function(fun = dt, arg = list(df = 733), color = "red", linetype = "dashed") + geom_vline(x = c(TS, -TS), linetype = "dashed", color = "blue")
<<<<<<< HEAD
TS
t
-1.715751
pv <- (sum(stat >= abs(TS))*2 + 1) / SIMS
pv
[1] 0.09690969
# ANOVA
set.seed(123)
SIMS <- 10000 - 1
stat <- numeric(SIMS)
TSF <- anova(lm(DailyCigsSmoked ~ Depression, data = nesarc.sub))[1, 4]
for(i in 1:SIMS){
stat[i] <- anova(lm(DailyCigsSmoked ~ sample(Depression), data = nesarc.sub))[1, 4]
}
ggplot(data = data.frame(stat = stat), aes(x = stat)) +
geom_density(fill = "pink") +
theme_bw() +
stat_function(fun = df, arg = list(df1 = 1, df2 = 1695), color = "red", linetype = "dashed") + geom_vline(x = TSF, linetype = "dashed", color = "blue") + ylim(0, 1)
<<<<<<< HEAD
TSF
[1] 3.149407
pvF <- (sum(stat >= TSF) + 1) / SIMS
pvF
[1] 0.07370737
<<<<<<< HEAD
=======
ggplot(data = nesarc.sub, aes(x = Ethnicity, y = TotalCigsSmoked, fill = Ethnicity)) +
geom_boxplot() +
guides(fill = FALSE) +
theme_bw() +
labs(x = "", y = "Estimated Number of Cigarettes Smoked per Month\n")
ggplot(data = nesarc.sub, aes(x = Ethnicity, y = TotalCigsSmoked, fill = Ethnicity)) +
geom_violin() +
guides(fill = FALSE) +
theme_bw() +
labs(x = "", y = "Estimated Number of Cigarettes Smoked per Month\n")
\(H_0:\mu_{\text{Caucassian}} = \mu_{\text{African American}}=\mu_{\text{Native American}} =\mu_{\text{Asian}} =\mu_{\text{Hispanic}}\) versus \(H_1:\mu_{\text{Caucassian}} \neq \mu_{\text{African American}} \neq \mu_{\text{Native American}} \neq \mu_{\text{Asian}} \neq \mu_{\text{Hispanic}}\)
eth.aov <- aov(TotalCigsSmoked ~ Ethnicity, data = nesarc.sub)
summary(eth.aov)
Df Sum Sq Mean Sq F value Pr(>F)
Ethnicity 4 7012102 1753026 24.39 <2e-16 ***
Residuals 1692 121606772 71872
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
9 observations deleted due to missingness
CIs <- TukeyHSD(eth.aov)
CIs
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = TotalCigsSmoked ~ Ethnicity, data = nesarc.sub)
$Ethnicity
diff lwr upr
African American-Caucasian -110.06777 -165.38997 -54.74558
Native American-Caucasian -58.90587 -174.09624 56.28450
Asian-Caucasian -123.96375 -222.69939 -25.22811
Hispanic-Caucasian -149.57543 -195.59596 -103.55490
Native American-African American 51.16190 -72.58143 174.90524
Asian-African American -13.89598 -122.48842 94.69646
Hispanic-African American -39.50766 -104.01720 25.00189
Asian-Native American -65.05788 -213.38385 83.26809
Hispanic-Native American -90.66956 -210.54348 29.20436
Hispanic-Asian -25.61168 -129.77339 78.55003
p adj
African American-Caucasian 0.0000006
Native American-Caucasian 0.6300505
Asian-Caucasian 0.0056089
Hispanic-Caucasian 0.0000000
Native American-African American 0.7911826
Asian-African American 0.9968050
Hispanic-African American 0.4514196
Asian-Native American 0.7527322
Hispanic-Native American 0.2356840
Hispanic-Asian 0.9625311
opar <- par(no.readonly = TRUE) # copy of current settings
par(mar =c(1, 15, 2, 1))
plot(CIs, las = 1)
par(opar) # restore original settings
Evidence (p = 1.190459710^{-19}) suggests a strong relationship between ethnicity and the estimated number of cigarettes an individual smokes per month. Post hoc comparisons suggest mean differences in the number of cigarettes smoked per month for African American and Caucasian, Asian and Caucasian, and Hispanic and Caucasian.
library(vcd)
mosaic(~TobaccoDependence + Depression, data = nesarc.sub, shade = TRUE)
xtabs(~TobaccoDependence + Depression, data = nesarc.sub)
Depression
TobaccoDependence No Depression Yes Depression
Nicotine Dependence 574 322
No Nicotine Dependence 686 124
T1 <- prop.table(xtabs(~TobaccoDependence + Depression, data = nesarc.sub), 1)
T1
Depression
TobaccoDependence No Depression Yes Depression
Nicotine Dependence 0.6406250 0.3593750
No Nicotine Dependence 0.8469136 0.1530864
CR <- chisq.test(nesarc.sub$TobaccoDependence, nesarc.sub$Depression)
CR
Pearson's Chi-squared test with Yates' continuity correction
data: nesarc.sub$TobaccoDependence and nesarc.sub$Depression
X-squared = 92.6945, df = 1, p-value < 2.2e-16
A Chi Square test of independence suggest that among young adult smokers, those with past year nicotine dependence were more likely to have experiences major depression in thier lifetime (35.9375%) compared to those withou past year nicotine dependence (15.308642%), \(\chi^2 = 92.6945346\), 1 df, p < 0.0001.
>>>>>>> dc4cb233a02691157d230500c5b52ee09dd78410library(moonBook)
out <- mytable(Depression + TobaccoDependence ~ Age + SmokingFreq + DailyCigsSmoked + Ethnicity + Sex +DaysSmoke + TotalCigsSmoked + CigsSmokedFac , data = nesarc.sub)
myhtml(out)
| Depression | No Depression | Yes Depression | ||||
|---|---|---|---|---|---|---|
| TobaccoDependence |
Nicotine Dependence (N=574) |
No Nicotine Dependence (N=686) |
p |
Nicotine Dependence (N=322) |
No Nicotine Dependence (N=124) |
p |
| Age | 21.5 ± 2.3 | 21.7 ± 2.1 | 0.145 | 21.5 ± 2.2 | 21.8 ± 2.2 | 0.205 |
| SmokingFreq | 0.000 | 0.000 | ||||
| Once a month or less | 6 ( 1.0%) | 43 ( 6.3%) | 1 ( 0.3%) | 21 (16.9%) | ||
| 2 to 3 Days/month | 8 ( 1.4%) | 41 ( 6.0%) | 4 ( 1.2%) | 12 ( 9.7%) | ||
| 1 to 2 Days/week | 12 ( 2.1%) | 60 ( 8.8%) | 7 ( 2.2%) | 9 ( 7.3%) | ||
| 3 to 4 Days/week | 22 ( 3.8%) | 50 ( 7.3%) | 10 ( 3.1%) | 9 ( 7.3%) | ||
| 5 to 6 Days/week | 16 ( 2.8%) | 34 ( 5.0%) | 11 ( 3.4%) | 7 ( 5.6%) | ||
| Every Day | 510 (88.9%) | 455 (66.6%) | 289 (89.8%) | 66 (53.2%) | ||
| DailyCigsSmoked | 13.6 ± 9.0 | 9.0 ± 7.5 | 0.000 | 13.8 ± 9.5 | 7.3 ± 6.1 | 0.000 |
| Ethnicity | 0.000 | 0.053 | ||||
| Caucasian | 377 (65.7%) | 402 (58.6%) | 214 (66.5%) | 67 (54.0%) | ||
| African American | 77 (13.4%) | 86 (12.5%) | 27 ( 8.4%) | 21 (16.9%) | ||
| Native American | 12 ( 2.1%) | 9 ( 1.3%) | 15 ( 4.7%) | 6 ( 4.8%) | ||
| Asian | 25 ( 4.4%) | 20 ( 2.9%) | 10 ( 3.1%) | 3 ( 2.4%) | ||
| Hispanic | 83 (14.5%) | 169 (24.6%) | 56 (17.4%) | 27 (21.8%) | ||
| Sex | 0.633 | 0.031 | ||||
| Female | 263 (45.8%) | 304 (44.3%) | 194 (60.2%) | 89 (71.8%) | ||
| Male | 311 (54.2%) | 382 (55.7%) | 128 (39.8%) | 35 (28.2%) | ||
| DaysSmoke | 0.000 | 0.000 | ||||
| 1 | 18 ( 3.1%) | 103 (15.1%) | 8 ( 2.5%) | 30 (24.2%) | ||
| 2.5 | 8 ( 1.4%) | 41 ( 6.0%) | 4 ( 1.2%) | 12 ( 9.7%) | ||
| 14 | 22 ( 3.8%) | 50 ( 7.3%) | 10 ( 3.1%) | 9 ( 7.3%) | ||
| 22 | 16 ( 2.8%) | 34 ( 5.0%) | 11 ( 3.4%) | 7 ( 5.6%) | ||
| 30 | 510 (88.9%) | 455 (66.6%) | 289 (89.8%) | 66 (53.2%) | ||
| TotalCigsSmoked | 394.9 ± 280.5 | 241.7 ± 239.6 | 0.000 | 403.9 ± 294.2 | 175.7 ± 197.0 | 0.000 |
| CigsSmokedFac | 0.000 | 0.000 | ||||
| [1,90] | 69 (12.0%) | 258 (38.0%) | 40 (12.5%) | 62 (50.4%) | ||
| (90,300] | 241 (42.0%) | 262 (38.6%) | 132 (41.1%) | 43 (35.0%) | ||
| (300,600] | 224 (39.0%) | 139 (20.5%) | 122 (38.0%) | 18 (14.6%) | ||
| (600,2.94e+03] | 40 ( 7.0%) | 20 ( 2.9%) | 27 ( 8.4%) | 0 ( 0.0%) | ||